Medium vocabulary continuous audio-visual speech recognition
نویسندگان
چکیده
This paper presents our experiments on continuous audiovisual speech recognition. A number of bimodal systems using feature fusion or fusion within Hidden Markov Models are implemented. Experiments with different fusion techniques and their results are presented. Further the performance levels of the bimodal system and a unimodal speech recognizer under noisy conditions are compared.
منابع مشابه
Audio - Visual Speech Recognition
We have made signi cant progress in automatic speech recognition (ASR) for well-de ned applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, for ASR to approach human levels of performance and for speech to become a truly pervasive user interface, we need novel, nontraditional approaches that have the potential of yielding...
متن کاملLarge-vocabulary audio-visual speech recognition by machines and humans
We compare automatic recognition with human perception of audio-visual speech, in the large-vocabulary, continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans, when combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a p...
متن کاملAsynchronous stream modeling for large vocabulary audio-visual speech recognition
This paper addresses the problem of audio-visual information fusion to provide highly robust speech recognition. We investigate methods that make different assumptions about asynchrony and conditional dependence across streams and propose a technique based on composite HMMs that can account for stream asynchrony and different levels of information integration. We show how these models can be tr...
متن کاملLarge-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins Summer 2000 Workshop
We report a summary of the Johns Hopkins Summer 2000 Workshop on audio-visual automatic speech recognition (ASR) in the large-vocabulary, continuous speech domain. Two problems of audio-visual ASR were mainly addressed: Visual feature extraction and audio-visual information fusion. First, image transform and model-based visual features were considered, obtained by means of the discrete cosine t...
متن کاملDct-based Video Features for Audio-v
Encouraged by the good performance of the DCT in audiovisual speech recognition [1], we investigate how the selection of the DCT coefficients influences the recognition scores in a hybridANN/HMM audio-visual speech recognition system on a continuous word recognition task with a vocabulary of 30 numbers. Three sets of coefficients, based on the mean energy, the variance and the variance relative...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002